129 research outputs found

    Beyond shared memory loop parallelism in the polyhedral model

    2013 Spring. Includes bibliographical references. With the introduction of multi-core processors, motivated by power and energy concerns, parallel processing has become mainstream. Parallel programming is much more difficult than sequential programming due to its non-deterministic nature, and because of the bugs that arise from non-determinacy. One solution is automatic parallelization, where it is entirely up to the compiler to efficiently parallelize sequential programs. However, automatic parallelization is very difficult, and only a handful of successful techniques are available, even after decades of research. Automatic parallelization for distributed memory architectures is even more problematic in that it requires explicit handling of data partitioning and communication. Since data must be partitioned among multiple nodes that do not share memory, the original memory allocation of sequential programs cannot be directly used. One of the main contributions of this dissertation is the development of techniques for generating distributed memory parallel code with parametric tiling. Our approach builds on important contributions to the polyhedral model, a mathematical framework for reasoning about program transformations. We show that many affine control programs can be uniformized with only simple techniques. Being able to assume uniform dependences significantly simplifies distributed memory code generation, and also enables parametric tiling. Our approach is implemented in the AlphaZ system, a system for prototyping analyses, transformations, and code generators in the polyhedral model. The key features of AlphaZ are memory re-allocation and explicit representation of reductions. We evaluate our approach on a collection of polyhedral kernels from the PolyBench suite, and show that our approach scales as well as PLuTo, a state-of-the-art shared memory automatic parallelizer using the polyhedral model. 
Automatic parallelization is only one approach to dealing with the non-deterministic nature of parallel programming, one that leaves the difficulty entirely to the compiler. Another approach is to develop novel parallel programming languages. These languages, such as X10, aim to provide a highly productive parallel programming environment by building parallelism into the language design. However, even in these languages, parallel bugs remain an important issue that hinders programmer productivity. Another contribution of this dissertation is to extend array dataflow analysis to handle a subset of X10 programs. We apply the result of dataflow analysis to statically guarantee determinism. Providing static guarantees can significantly increase programmer productivity by catching questionable implementations at compile-time, or even while programming

    Automatic creation of tile size selection models using neural networks

    2010 Spring. Includes bibliographic references (pages 54-59). Covers not scanned. Print version deaccessioned 2022. Tiling is a widely used loop transformation for exposing/exploiting parallelism and data locality. Effective use of tiling requires selection and tuning of the tile sizes. This is usually achieved by hand-crafting tile size selection (TSS) models that characterize the performance of the tiled program as a function of tile sizes. The best tile sizes are selected by either directly using the TSS model or by using the TSS model together with an empirical search. Hand-crafting accurate TSS models is hard, and adapting them to different architectures/compilers, or even keeping them up-to-date with respect to the evolution of a single compiler, is often just as hard. Instead of hand-crafting TSS models, can we automatically learn or create them? In this paper, we show that for a specific class of programs fairly accurate TSS models can be automatically created by using a combination of simple program features, synthetic kernels, and standard machine learning techniques. The automatic TSS model generation scheme can also be directly used for adapting the model and/or keeping it up-to-date. We evaluate our scheme on six different architecture-compiler combinations (chosen from three different architectures and four different compilers). The models learned by our method have consistently shown near-optimal performance (within 5% of the optimal on average) across the tested architecture-compiler combinations

    Liveness Analysis in Explicitly-Parallel Programs

    International audience. In this paper, we revisit scalar and array element-wise liveness analysis for programs with parallel specifications. In earlier work on memory allocation/contraction (register allocation or intra- and inter-array reuse in the polyhedral model), a notion of "time" or a total order among the iteration points was used to compute the liveness of values. In general, the execution of parallel programs is not a total order, and hence the notion of time is not applicable. We first revise how conflicts are computed by using ideas from liveness analysis for register allocation, studying the structure of the corresponding conflict/interference graphs. Instead of considering the conflict between two live ranges, we only consider the conflict between a live range and a write. This simplifies the formulation from having four instances involved in the test down to three, and also improves the precision of the analysis in the general case. Then we extend the liveness analysis to work with partial orders so that it can be applied to many different parallel languages/specifications with different forms of parallelism. An important result is that the complement of the conflict graph with partial orders is directly connected to memory reuse, even in the presence of races. However, programs with conditionals do not always define a partial order, and our next step will be to handle such cases with more accuracy

    Study on high pressure design of contra-rotating small hydroturbine

    Small hydroturbines are required to combine small size with high performance. We therefore adopted contra-rotating rotors, which can be expected to achieve both. However, as the rotors become smaller, the output power of conventional contra-rotating rotors, which are composed of axial flow rotors, becomes very low. To achieve small size and high output power, we propose a new type of contra-rotating rotors, composed of a hybrid rotor and a centrifugal rotor. In this research, we investigate the performance of the new contra-rotating small hydroturbine model by numerical analysis. We report on the influence of the deflection angle on the performance and internal flow condition

    Technique étendue d’allocation mémoire basée sur les réseaux entiers

    This work extends lattice-based memory allocation, an earlier work on memory (array) reuse analysis. The main motivation is to handle in a better way the more general forms of specifications we see today, e.g., with loop tiling, pipelining, and other forms of parallelism available in explicitly parallel languages. Our extension has two complementary aspects. We show how to handle more general specifications where conflicting constraints (those that describe the array indices that cannot share the same location) are specified as a (non-convex) union of polyhedra. Unlike convex specifications, this also requires being able to choose suitable directions (or a basis) of array reuse. For that, we extend two dual approaches, previously proposed for a fixed basis, into optimization schemes to select a suitable basis. Our final approach relies on a combination of the two, also revealing their links with, on one hand, the construction of multi-dimensional schedules for parallelism and tiling (but with a fundamental difference that we identify) and, on the other hand, the construction of universal reuse vectors (UOV), which were only used so far in a specific context, for schedule-independent mapping

    Spéculation temporelle algorithmique pour accélérateurs de réseaux de neuro

    In this paper, we propose a technique for improving the efficiency of hardware accelerators based on timing speculation (overclocking) and fault tolerance. We augment the accelerator with a lightweight error detection mechanism to protect against timing errors, enabling aggressive timing speculation. We demonstrate the validity of our approach for the convolution layers in Convolutional Neural Networks (CNN). We present an implementation of a fault-tolerant CNN accelerator combined with the lightweight error detection for convolution layers. The error detection mechanism we have developed works at the algorithm level, based on algebraic properties of the computation, allowing the full implementation to be realized using High-Level Synthesis tools. We use a set of Zybo boards to experimentally demonstrate that overclocking boosts the frequency by 17-36% with low chances of error, and that the infrequent errors can be detected with a negligible overhead (only 1000 LUTs)

    Nectin-2 is a potential target for antibody therapy of breast and ovarian cancers

    BACKGROUND: Nectin-2 is a Ca(2+)-independent cell-cell adhesion molecule that is one of the plasma membrane components of adherens junctions. However, little has been reported about the involvement of Nectin-2 in cancer. METHODS: To determine the expression of Nectin-2 in cancer tissues and cancer cell lines, we performed gene expression profile analysis, immunohistochemistry studies, and flow cytometry analysis. We also investigated the potential of this molecule as a target for antibody therapeutics to treat cancers by generating and characterizing an anti-Nectin-2 rabbit polyclonal antibody (poAb) and 256 fully human anti-Nectin-2 monoclonal antibodies (mAbs). In addition, we tested anti-Nectin-2 mAbs in several in vivo tumor growth inhibition models to investigate the primary mechanisms of action of the mAbs. RESULTS: In the present study, we found that Nectin-2 was over-expressed in clinical breast and ovarian cancer tissues by using gene expression profile analysis and immunohistochemistry studies. Nectin-2 was over-expressed in various cancer cell lines as well. Furthermore, the polyclonal antibody specific to Nectin-2 suppressed the in vitro proliferation of OV-90 ovarian cancer cells, which express endogenous Nectin-2 on the cell surface. The anti-Nectin-2 mAbs we generated were classified into 7 epitope bins. The anti-Nectin-2 mAbs demonstrated antibody-dependent cellular cytotoxicity (ADCC) and epitope bin-dependent features such as the inhibition of Nectin-2-Nectin-2 interaction, Nectin-2-Nectin-3 interaction, and in vitro cancer cell proliferation. A representative anti-Nectin-2 mAb in epitope bin VII, Y-443, showed anti-tumor effects against OV-90 cells and MDA-MB-231 breast cancer cells in mouse therapeutic models, and its main mechanism of action appeared to be ADCC. CONCLUSIONS: We observed the over-expression of Nectin-2 in breast and ovarian cancers and anti-tumor activity of anti-Nectin-2 mAbs via strong ADCC. 
These findings suggest that Nectin-2 is a potential target for antibody therapy against breast and ovarian cancers

    Model-Driven Engineering and Optimizing Compilers: A bridge too far?

    International audience. A primary goal of Model Driven Engineering (MDE) is to reduce the cost and effort of developing complex software systems using techniques for transforming abstract views of software to concrete implementations. The rich set of tools that have been developed, especially the growing maturity of model transformation technologies, opens the possibility of applying MDE technologies to transformation-based problems in other domains. In this paper, we present our experience with using MDE technologies to build and evolve compiler infrastructures in the optimizing compiler domain. We illustrate, through our two ongoing research compiler projects for C and a functional language, the challenging aspects of optimizing compiler research and show how mature MDE technologies can be used to address them. We also identify some of the pitfalls that arise from unrealistic expectations of what can be accomplished using MDE and discuss how they can lead to unsuccessful and frustrating application of MDE technologies

    A clinicopathological study of perineural invasion and vascular invasion in oral tongue squamous cell carcinoma

    The risk factors for recurrence of head and neck cancer are classified as being of high or intermediate risk. Those of intermediate risk include multiple positive nodes without extracapsular nodal spread, perineural/vascular invasion, pT3/T4 primary tumours, and positive level IV/V nodes. However, little evidence is available to validate these intermediate risk factors. We analyzed perineural/vascular invasion in 89 patients who underwent radical surgery for oral tongue squamous cell carcinoma, whose records were reviewed retrospectively. Perineural invasion was found in 27.0% of cases and vascular invasion in 23.6%; both had a strong relationship with histopathological nodal status (P = 0.005). The 5-year disease-specific survival (DSS) and overall survival rates of patients with perineural invasion were significantly lower than those of patients without perineural invasion (P < 0.001 and P = 0.002, respectively). The 5-year DSS of UICC stage I and II cases with perineural/vascular invasion was significantly lower than those without (P < 0.001 and P = 0.008, respectively). Perineural invasion and vascular invasion are risk factors for regional metastasis and a poor prognosis. We recommend elective neck dissection when perineural/vascular invasion is found in clinical stage I and II cases. The accumulation of further evidence to consider intermediate risks is required